Summarizing Two-Dimensional Data with Skyline-Based Statistical Descriptors
Abstract
Much real-world data has more than one dimension, such as financial transactions (e.g., price × volume) and IP network flows (e.g., duration × numBytes), and captures relationships between the variables. For a single dimension, quantiles are intuitive and robust descriptors. Processing and analyzing multidimensional data, particularly in data warehouse or data streaming settings, requires similarly robust and informative statistical descriptors that go beyond one dimension. Applying quantile methods to summarize a multidimensional distribution one attribute at a time ignores the rich dependence among the variables. In this paper, we present new skyline-based statistical descriptors for capturing the distributions over pairs of dimensions. They generalize the notion of quantiles in the individual dimensions and also incorporate properties of the joint distribution. We introduce φ-quantours and α-radials, which are skyline points over subsets of the data, and propose (φ, α)-quantiles, found from the union of these skylines, as statistical descriptors of two-dimensional distributions. We present efficient online algorithms for tracking (φ, α)-quantiles on two-dimensional streams using guaranteed small space. We identify the principal properties of the proposed descriptors and perform extensive experiments with synthetic and real IP traffic data to study the efficiency of our proposed algorithms.
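As background for the skyline terminology used in the abstract, the following is a minimal sketch of a generic two-dimensional skyline computation. It is not the paper's algorithm and does not compute φ-quantours, α-radials, or (φ, α)-quantiles, whose exact definitions are not given here. The function name `skyline_2d` and the convention that larger values dominate in both dimensions are assumptions made for illustration only.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def skyline_2d(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point.

    Assumption (illustrative only): p dominates q when p is at least as
    large as q in both coordinates and strictly larger in at least one.
    """
    # Sort by x descending, breaking ties by y descending, so that every
    # point that could dominate the current one has already been visited.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))

    result: List[Point] = []
    best_y = float("-inf")  # largest y seen among points visited so far
    for x, y in ordered:
        # A point survives only if its y strictly exceeds every y seen
        # so far; otherwise an earlier point dominates or duplicates it.
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result


if __name__ == "__main__":
    sample = [(1, 9), (2, 7), (3, 8), (5, 3), (4, 4), (5, 1), (2, 2)]
    print(skyline_2d(sample))
```

The sort costs O(n log n) and the sweep is a single pass. This is an offline computation over a stored point set, so it says nothing about the small-space streaming guarantees that the abstract claims for the authors' online algorithms.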
Similar resources
A New Approach for Optimization of Dynamic Metric Access Methods Using an Algorithm of Effective Deletion
New Challenges in Petascale Scientific Databases p. 1 Adventures in the Blogosphere p. 2 The Evolution of Vertical Database Architectures A Historical Review p. 3 Query Optimization in Scientific Databases Linked Bernoulli Synopses: Sampling along Foreign Keys p. 6 Query Planning for Searching Inter-dependent Deep-Web Databases p. 24 Summarizing Two-Dimensional Data with Skyline-Based Statistic...
Geometry-Based Distributed Spatial Skyline Queries in Wireless Sensor Networks
Algorithms for skyline querying based on wireless sensor networks (WSNs) have been widely used in the field of environmental monitoring. Because of the multi-dimensional nature of the problem of monitoring spatial position, traditional skyline query strategies cause enormous computational costs and energy consumption. To ensure the efficient use of sensor energy, a geometry-based distributed sp...
On High Dimensional Skylines
In many decision-making applications, the skyline query is frequently used to find a set of dominating data points (called skyline points) in a multidimensional dataset. In a high-dimensional space skyline points no longer offer any interesting insights as there are too many of them. In this paper, we introduce a novel metric, called skyline frequency that compares and ranks the interestingness...
The MPEG-7 Standard and the Content-Based Management of Three-Dimensional Data: A Case Study
Content-based management of three-dimensional data in the framework of the MPEG-7 standard is considered. An overview on MPEG-7 and three-dimensional data is presented. Descriptors and description schemes suitable for three-dimensional data are introduced and their integration in the framework of MPEG-7 is detailed. The shape descriptors are based on cords, wavelet transform and three-dimension...
SkyDist: Data Mining on Skyline Objects
The skyline operator is a well established database primitive which is traditionally applied in a way that only a single skyline is computed. In this paper we use multiple skylines themselves as objects for data exploration and data mining. We define a novel similarity measure for comparing different skylines, called SkyDist. SkyDist can be used for complex analysis tasks such as clustering, cl...
Publication date: 2008